A Multiple Imputation Approach for Handling Missing Data in Classification and Regression Trees
Similar resources
Multiple imputation for missing data via sequential regression trees.
Multiple imputation is particularly well suited to deal with missing data in large epidemiologic studies, because typically these studies support a wide range of analyses by many data users. Some of these analyses may involve complex modeling, including interactions and nonlinear relations. Identifying such relations and encoding them in imputation models, for example, in the conditional regres...
Handling Missing Data in Trees: Surrogate Splits or Statistical Imputation
In many applications of data mining a sometimes considerable part of the data values is missing. This may occur because the data values were simply never entered into the operational systems from which the mining table was constructed, or because for example simple domain checks indicate that entered values are incorrect. Despite the frequent occurrence of missing data, most data mining algorit...
Multiple Imputation for Missing Data
Multiple imputation provides a useful strategy for dealing with data sets with missing values. Instead of filling in a single value for each missing value, Rubin’s (1987) multiple imputation procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. These multiply imputed data sets are then analyzed by using standard proc...
A nonparametric multiple imputation approach for missing categorical data
BACKGROUND Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. METHODS We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each catego...
Journal
Journal title: Journal of Behavioral Data Science
Year: 2021
ISSN: 2575-8306,2574-1284
DOI: 10.35566/jbds/v1n1/p6